An Ensemble Approach to Instance-Based Regression Using Stretched Neighborhoods

Authors

  • Vahid Jalali
  • David B. Leake
Abstract

Instance-based regression methods generate solutions from prior solutions within a neighborhood of the input query. Their performance depends both on the neighborhood selection criteria and on the method for generating new solutions from the values of prior instances. This paper proposes a new approach to addressing both problems, in which solutions are generated from an ensemble of the solutions of local linear regression models built for a collection of “stretched” neighborhoods of the query. Each neighborhood is generated by relaxing a different dimension of the problem space. The rationale is to enable major change trends along that dimension to have increased influence on the corresponding model. The approach is evaluated for two candidate relaxation strategies, one gradient-based and one based on fixed profiles, and compared to two baselines: k-NN and a radius-based spherical neighborhood in n-dimensional space. Results in four test domains show up to 15 percent improvement over the baselines, and suggest that the approach could be particularly useful in domains for which the space of prior instances is sparse.

Introduction

Lazy learning methods postpone building a model or estimating the target function until a query is submitted, generating local estimates tailored to the specific input problem. Lazy learning can be especially beneficial for complex and incomplete domains in which a set of (possibly relatively simpler) local models may provide higher quality results than a single global model. The notion of locality in lazy learning is often defined by a distance function that is used to find a set of nearest neighbors of the input query, from whose values a value is computed for the query. Normally the neighborhood is determined by the distance function and a predefined number of nearby instances to consider: k-NN considers the k nearest neighbors.
However, other neighborhood selection schemes and combination functions may sometimes be more appropriate. In this paper, we explore an approach using new neighborhood selection functions for choosing points from which to calculate a target value, using linear regression to develop local models, and then combining the values of those models to produce a value. From the perspective of case-based reasoning, this corresponds to a new approach to adapting the solutions of prior cases for regression tasks. The method trains linear regression models for local neighborhoods of the input query, in which the nearest neighbor calculation is adapted to “stretch” one dimension. More specifically, for an n-dimensional space, distances are computed by a normal domain similarity metric in n-1 dimensions, and distances in the remaining dimension are relaxed according to either a gradient-based strategy or a strategy based on a fixed “shape” profile. By generating a model for each of a set of neighborhoods, each relaxing one of the n possible dimensions, n different linear regression models are generated for the specific query. The final value is generated by averaging the estimates returned by the models. We first present the approach and then evaluate its performance for four sample domains, showing encouraging results. We close with observations and topics for future research.

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Related Work

Many instance-based learning approaches use k-NN, selecting neighborhoods composed of the k nearest instances. The effects of neighborhood shape have received little study. Outside of instance-based learning, numerous approaches have been explored for regression tasks.
For example, Kwok and Yeung (Kwok and Yeung 1997) apply feed-forward neural networks, Orr (Orr 1996) applies radial basis function networks, and Schölkopf and Smola (Schölkopf and Smola 2001) apply Support Vector Machines by transforming the regression problem into a constrained optimization problem. These methods differ from our work both in the models used and in the non-lazy nature of model generation. Within case-based reasoning, McSherry (McSherry 1998) proposes a regression approach based on pair-wise comparisons between cases in a case base, using those pairs to adapt the solution of a retrieved case to an input query. Most relevant to our approach, Patterson et al. (Patterson, Rooney, and Galushka 2002) train a locally weighted regression model for predicting the difference in the target value of two cases. They use a distance-weighted average for creating a generalized case from the top k nearest neighbors to the input query, and adapt the so...

Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference
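
The ensemble described in the Introduction (one local linear model per relaxed dimension, with the n estimates averaged) can be illustrated with a minimal sketch. The excerpt does not specify the gradient-based or fixed-profile relaxation strategies, so this sketch substitutes a simpler stand-in: the relaxed dimension's contribution to the Euclidean distance is scaled by a constant factor. The function names and the `k` and `stretch` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def stretched_neighbors(X, query, dim, k, stretch=0.25):
    """Return indices of the k nearest instances when dimension `dim` is relaxed.

    Assumption: relaxation is a constant down-weighting of differences along
    `dim`. This stands in for the paper's gradient-based and fixed-profile
    strategies, which the excerpt does not detail.
    """
    weights = np.ones(X.shape[1])
    weights[dim] = stretch
    diffs = (X - query) * weights
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    # Stable sort so ties are broken by original instance order.
    return np.argsort(dists, kind="stable")[:k]


def local_linear_predict(X_nb, y_nb, query):
    """Fit least-squares linear regression (with intercept) on the neighbors
    and evaluate the fitted model at the query point."""
    A = np.hstack([X_nb, np.ones((len(X_nb), 1))])
    coef, *_ = np.linalg.lstsq(A, y_nb, rcond=None)
    return float(np.append(query, 1.0) @ coef)


def stretched_ensemble_predict(X, y, query, k=10, stretch=0.25):
    """Average the estimates of n local models, one per relaxed dimension."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    query = np.asarray(query, dtype=float)
    estimates = []
    for dim in range(X.shape[1]):
        idx = stretched_neighbors(X, query, dim, k, stretch)
        estimates.append(local_linear_predict(X[idx], y[idx], query))
    return float(np.mean(estimates))


if __name__ == "__main__":
    # Synthetic data for illustration only; not one of the paper's domains.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 3))
    y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + 3.0
    print(stretched_ensemble_predict(X, y, [0.1, -0.2, 0.3]))
```

With a `stretch` value below 1, instances that lie far from the query along the relaxed dimension but close in the other dimensions become eligible neighbors, which is one way to let trends along that dimension influence the corresponding local model.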

Similar resources

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

Predicting distribution of Eurasian Lynx (Lynx lynx) using an ensemble modeling approach: A Case Study: Saveh Zarandieh Kharaghan Area, Markazi Province

Adequate knowledge about suitable habitats for wildlife is essential to prevent habitat destruction and extinction of species and for their conservation and management. The Eurasian lynx is one of the most widely distributed cats in Asia. In this study, we applied an ensemble habitat suitability modeling approach, using ten predictor variables to model Eurasian Lynx’s habitat suitability in Saveh Za...

Ensemble of M5 Model Tree Based Modelling of Sodium Adsorption Ratio

This work reports the results of four ensemble approaches with the M5 model tree as the base regression model to anticipate Sodium Adsorption Ratio (SAR). Ensemble methods that combine the output of multiple regression models have been found to be more accurate than any of the individual models making up the ensemble. In this study additive boosting, bagging, rotation forest and random subspace...

Monitoring of Regional Low-Flow Frequency Using Artificial Neural Networks

Much of the country lies in arid and semiarid regions, whose sensitive and fragile ecosystems are easily degraded. In this paper, artificial neural networks (ANNs) are introduced to obtain improved regional low-flow estimates at ungauged sites. A multilayer perceptron (MLP) network is used to identify the funct...

An Ensemble Approach to Adaptation-Guided Retrieval

Instance-based learning methods predict the solution of a case from the solutions of similar cases. However, solutions can be generated from less similar cases as well, provided appropriate “case adaptation” rules are available to adjust the prior solutions to account for dissimilarities. In fact, case-based reasoning research on adaptation-guided retrieval (AGR) shows that it may be beneficial...

Publication date: 2013